Llama 3.2 90B Vision Instruct
Llama 3.2-Vision is a multimodal large language model developed by Meta, supporting image and text input with text output, excelling in visual recognition, image reasoning, image captioning, and visual question answering tasks.
Image-to-Text
Transformers Supports Multiple Languages